computational graph

5421e013565f7f1afa0cfe8ad87a99ab-AuthorFeedback.pdf

Neural Information Processing Systems

For now, we report total running times on the cross-validated computational graphs for a diverse selection of datasets. We will augment this with a detailed description in the supplementary. Missing values: We selected k-NN imputation because it arguably provides a stronger baseline than simple mean imputation (while being computationally more demanding). However, using EM as an inner loop within a structure search would be computationally quite demanding. Determining the computational graph is far simpler, and can be tackled with cross-validation (as in this paper) or, as suggested by the reviewer, with AutoML techniques or neural architecture search (NAS).
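
The feedback contrasts k-NN imputation with simple mean imputation as baselines for missing values. As a minimal illustrative sketch (not the authors' code; the toy matrix and the choice k=2 are assumptions), both baselines are available in scikit-learn:

```python
# Illustrative comparison of the two imputation baselines discussed above.
# Not the authors' implementation; toy data and n_neighbors=2 are assumptions.
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 6.0, 9.0],
              [7.0, 8.0, 12.0]])

mean_imputer = SimpleImputer(strategy="mean")  # simple mean baseline
knn_imputer = KNNImputer(n_neighbors=2)        # stronger but costlier baseline

print(mean_imputer.fit_transform(X))  # fills each NaN with the column mean
print(knn_imputer.fit_transform(X))   # fills each NaN from the 2 nearest rows
```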



Dense Neural Networks are not Universal Approximators

Rauchwerger, Levi, Jegelka, Stefanie, Levie, Ron

arXiv.org Machine Learning

We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that dense neural networks do not possess this universality. Our argument is based on a model compression approach, combining the weak regularity lemma with an interpretation of feedforward networks as message passing graph neural networks. We consider ReLU neural networks subject to natural constraints on weights and input and output dimensions, which model a notion of dense connectivity. Within this setting, we demonstrate the existence of Lipschitz continuous functions that cannot be approximated by such networks. This highlights intrinsic limitations of neural networks with dense layers and motivates the use of sparse connectivity as a necessary ingredient for achieving true universality.
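
For context, the classical universal approximation statement that the abstract contrasts with can be written in its standard one-hidden-layer form (a textbook formulation, not taken from the paper; σ denotes the activation, e.g., ReLU, and the weights are unconstrained):

```latex
% Standard universal approximation statement (textbook form, not from the
% paper): for any continuous f on a compact set K and any eps > 0, some
% finite width N and unconstrained parameters give an eps-approximation.
\[
\forall f \in C(K),\ \forall \varepsilon > 0,\ \exists N,\ \{a_i, w_i, b_i\}_{i=1}^{N}:
\quad \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} a_i\, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]
```

The paper's claim is that once the weights and the input/output dimensions are constrained to model dense connectivity, this guarantee fails for some Lipschitz continuous target functions.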


f6185f0ef02dcaec414a3171cd01c697-Paper.pdf

Neural Information Processing Systems

Consider the problem of training deep neural networks on large annotated datasets, such as ImageNet [1]. This problem can be formalized as finding optimal parameters for a given neural network a, parameterized by w, w.r.t. a loss function.
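
In its standard empirical-risk-minimization form (a common formalization assumed here, since the excerpt breaks off; the loss ℓ and the data pairs (x_i, y_i) are not given in the text), the objective reads:

```latex
% Standard empirical-risk-minimization form of the training problem
% (assumed formalization; \ell and the pairs (x_i, y_i) are not specified
% in the excerpt).
\[
w^{*} \;=\; \arg\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(a(x_i; w),\, y_i\bigr).
\]
```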